Zone Content Classification and its Performance Evaluation

Authors

  • Yalin Wang
  • Robert M. Haralick
  • Ihsin T. Phillips
Abstract

This paper presents an improved zone content classification method and its performance evaluation. We added two new features to the feature vector of a previously published method [1]. We assumed different independence relationships in two zone sets: in one set, we used an optimized binary decision tree to estimate the maximum zone content class probability; in the other, we used the Viterbi algorithm to find the optimal solution for a zone sequence. The training, pruning, and testing data sets for the algorithm include 1,600 images drawn from the UWCDROM III document image database. The classifier is able to classify each given scientific and technical document zone into one of nine classes: two text classes (font size 4-18 pt and font size 19-32 pt), math, table, halftone, map/drawing, ruling, logo, and others. Compared with our previous work [2], it raised the accuracy rate from 97.53% to 98.52% and reduced the mean false alarm rate from 1.26% to 0.53%.
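
The abstract mentions using the Viterbi algorithm to find the optimal labeling of a zone sequence. The abstract does not give the paper's transition model, features, or scores, so the following is only a minimal generic sketch of Viterbi decoding in Python. The function name `viterbi`, its three inputs, and the toy numbers are all hypothetical illustrations, not the authors' implementation.

```python
import math

def viterbi(log_init, log_trans, log_emit):
    """Generic Viterbi decoding (illustrative; not the paper's exact model).

    log_init[s]     : log-probability that the first zone has class s
    log_trans[r][s] : log-probability of class s following class r
    log_emit[t][s]  : log-score of class s for zone t (assumed to come
                      from some per-zone classifier)
    Returns (best class sequence, its total log-probability).
    """
    n_states = len(log_init)
    # Best score of any path ending in each class at the first zone.
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of class s at zone t
    for t in range(1, len(log_emit)):
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda r: score[r] + log_trans[r][s])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_emit[t][s])
        score = new_score
        back.append(pointers)
    # Trace back from the best final class.
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]

# Toy example with two hypothetical classes (0 and 1) and three zones.
log = math.log
init = [log(0.5), log(0.5)]
trans = [[log(0.9), log(0.1)], [log(0.1), log(0.9)]]
emit = [[log(0.9), log(0.1)], [log(0.4), log(0.6)], [log(0.9), log(0.1)]]
best_path, best_logp = viterbi(init, trans, emit)
```

In this toy input, labeling each zone independently would pick class 1 for the middle zone, whereas decoding the sequence jointly keeps all three zones in class 0 because switching classes is assumed to be unlikely. That difference is the general motivation for sequence decoding over per-zone decisions.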

Similar resources

Document zone content classification and its performance evaluation

This paper describes an algorithm for the determination of zone content type of a given zone within a document image. We take a statistical based approach and represent each zone with 25 dimensional feature vectors. An optimized decision tree classifier is used to classify each zone into one of nine zone content classes. A performance evaluation protocol is proposed. The training and t...

Document Zone Content Classification Using Decision Tree and HMM

A document can be divided into zones on the basis of its content. For example, a zone can be either text or non-text. This paper describes an algorithm to classify each given document zone into one of nine different classes. Foreground and background features are studied. We used an optimized binary decision tree to estimate the maximum zone content class probability in one set while used Viter...

A Study on the Document Zone Content Classification Problem

A document can be divided into zones on the basis of its content. For example, a zone can be either text or non-text. Given the segmented document zones, correctly determining the zone content type is very important for the subsequent processes within any document image understanding system. This paper describes an algorithm for the determination of zone type of a given zone within an input doc...

A Method for Document Zone Content Classification

This paper describes an algorithm to classify each given document zone into one of nine classes and provides a protocol for its performance evaluation. The classification scheme uses an optimized binary decision tree and Viterbi algorithm for HMM to find the optimal solution. Our algorithm was trained and tested on a total of 24,177 zones within the 1600 images from UWCDROM III database. Its ac...

Page Layout Classification Technique for Biomedical Documents

The structural layout information of scanned document pages is valuable for a wide range of document processing applications such as automatic document searching, document delivery and automated data entry. This paper describes the classification of scanned document pages into different classes of physical layout structures. The page layout classification technique proposed in this paper uses a...

Publication date: 2001